Association between lifestyle activities and cognitive function across adult lifespan

APS 2014: Poster presentation

A growing body of literature suggests that a wide array of lifestyle activities are associated with better cognitive function and reduced risk for age-related neurodegenerative disorders.

However, many studies on lifestyle and cognition have primarily involved elderly populations in controlled laboratory settings.

The use of Internet- and mobile application-based technology for data collection (Killingsworth & Gilbert, 2010; Lee et al., 2012; Nosek et al., 2009) has allowed researchers to collect large amounts of data from culturally and regionally diverse populations.

With data collected through an iPad application (BrainBaseline), we examined the association between various lifestyle activities and cognitive function across the adult lifespan.

In [1]:
import pandas as pd
import numpy as np
import seaborn as sns
import matplotlib.pyplot as plt
from collections import Counter
from scipy import stats
from IPython.display import Image
sns.set_style("whitegrid")
sns.set_context("talk")
%matplotlib inline

In [4]:
Image('Brainbaseline/APS_2014_Lee.png')

BrainBaseline, the iPad application used to collect the data:

In [5]:
Image('Brainbaseline/bb.png') 

I. Data cleaning and preparation

We performed a Principal Component Analysis (PCA) on the survey responses and grouped the lifestyle factors into three categories: physical activity, leisure activity, and socioeconomic status.
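The poster does not specify how the PCA was run. As a minimal sketch, assuming standardized survey items that load on one common factor (the item names and data below are hypothetical and synthetic), a composite score can be computed as each participant's projection onto the first principal component of a group of related items:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical survey items driven by one latent factor (synthetic data)
n = 500
latent = rng.normal(size=n)
items = pd.DataFrame({
    "walks_per_week": latent + rng.normal(scale=0.5, size=n),
    "gym_sessions": latent + rng.normal(scale=0.5, size=n),
    "sports_hours": latent + rng.normal(scale=0.5, size=n),
})

def first_pc_score(df):
    """Return first-principal-component scores and the share of variance PC1 explains."""
    z = (df - df.mean()) / df.std(ddof=0)           # standardize each item
    _, s, vt = np.linalg.svd(z.to_numpy(), full_matrices=False)
    loadings = vt[0]                                # first principal axis
    if loadings.sum() < 0:                          # PCA sign is arbitrary; make higher = more activity
        loadings = -loadings
    scores = z.to_numpy() @ loadings
    explained = s[0] ** 2 / np.sum(s ** 2)
    return pd.Series(scores, index=df.index), explained

exercise_score, explained = first_pc_score(items)
print(f"variance explained by PC1: {explained:.2f}")
```

In practice each category (physical activity, leisure, socioeconomic status) would get its own item group; the sign flip only ensures that a higher composite means more of the activity.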

Let's load the data with the lifestyle composite scores and bin participants by age decade.

In [6]:
bb = pd.read_csv("Brainbaseline/bb_all.csv", skipinitialspace=True)
bb = bb.dropna(how='all')
bb = bb[['age', 'ageBin', 'exerciseScore', 'leisureScore', 'socioEconomicScore', 'memoryComposite', 'processingSpeedComposite_r']]

# Keep adults aged 20-79; .copy() avoids SettingWithCopyWarning when adding columns
bb = bb[(bb["age"] >= 20) & (bb["age"] < 80)].copy()

# Decade bins: 2 = 20s, 3 = 30s, ..., with the 60s and 70s pooled into bin 6
bb['ageBin'] = (bb['age'] // 10).clip(upper=6)
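Floor division maps ages 20 to 29 to bin 2, 30 to 39 to bin 3, and so on, and the clip pools the 60s and 70s into bin 6. The same rule written with pd.cut makes the bin edges explicit; the ages below are toy values, not study data:

```python
import pandas as pd

ages = pd.Series([20, 29, 30, 45, 59, 60, 69, 70, 79])

# Floor-division rule: decade number, with the 70s merged into bin 6
by_floor = (ages // 10).clip(upper=6)

# Equivalent explicit bins: [20,30), [30,40), [40,50), [50,60), [60,80)
by_cut = pd.cut(ages, bins=[20, 30, 40, 50, 60, 80], right=False,
                labels=[2, 3, 4, 5, 6]).astype(int)

print(pd.DataFrame({"age": ages, "floor": by_floor, "cut": by_cut}))
```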

We also split the leisure, exercise and socioeconomic scores at their median within each age bin ('high' vs. 'low') for later analysis.

In [7]:
whichScore = ['leisureScore', 'exerciseScore', 'socioEconomicScore']

# One DataFrame per age bin; .copy() so new columns can be added safely
age_dict = {i: bb[bb['ageBin'] == i].copy() for i in range(2, 7)}

def binning(score, field):
    """Median split of `field` within one age bin: 'high' if above the median, else 'low'; NaN stays NaN."""
    median = score[field].median()
    score[field + '_Bin'] = score[field].dropna().map(lambda x: 'high' if x > median else 'low')

for s in whichScore:
    for i in range(2, 7):
        binning(age_dict[i], s)

# Recombine the age bins (DataFrame.append was removed in pandas 2.0)
bb_new = pd.concat([age_dict[i] for i in range(2, 7)])

print(Counter(bb_new['leisureScore_Bin']))
print(Counter(bb_new['exerciseScore_Bin']))
print(Counter(bb_new['socioEconomicScore_Bin']))
Counter({nan: 8840, 'low': 7604, 'high': 7139})
Counter({nan: 9537, 'low': 7194, 'high': 6852})
Counter({'low': 12729, 'high': 7712, nan: 3142})
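The nested loop over age bins can also be expressed with groupby(...).transform('median'). This sketch assumes a DataFrame with an ageBin column and one score column (toy values below; median_split is a hypothetical helper) and follows the same rule as binning: strictly above the age bin's median is 'high', otherwise 'low', and missing scores stay missing. Because the comparison is strict, ties at the median go to 'low', which can unbalance the split for a discrete score such as socioeconomic status.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "ageBin": [2, 2, 2, 2, 3, 3, 3],
    "leisureScore": [1.0, 2.0, 3.0, np.nan, 5.0, 5.0, 9.0],
})

def median_split(data, field, group="ageBin"):
    """'high' if above the within-group median, 'low' otherwise; NaN stays NaN."""
    med = data.groupby(group)[field].transform("median")   # NaN is skipped in the median
    labels = np.where(data[field] > med, "high", "low")
    return pd.Series(labels, index=data.index).where(data[field].notna())

df["leisureBin"] = median_split(df, "leisureScore")
print(df)
```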

Rename the columns with shorter, more intuitive names.

In [8]:
bb_new.rename(columns={'leisureScore_Bin': 'leisureBin', 'exerciseScore_Bin': 'exerciseBin', 'socioEconomicScore_Bin': 'socioEconomicBin', 'memoryComposite': 'memory', 'processingSpeedComposite_r': 'processingSpeed'}, inplace=True)

Let's view the first five rows to check what the DataFrame contains.

In [9]:
bb_new.head()
Out[9]:
   age  ageBin  exerciseScore  exerciseBin  leisureScore  leisureBin    memory  processingSpeed  socioEconomicScore  socioEconomicBin
0   26       2            NaN          NaN           NaN         NaN  -0.24963         -1.59615                 NaN               NaN
2   28       2            NaN          NaN           NaN         NaN       NaN         -0.37637                   7              high
3   27       2            NaN          NaN           NaN         NaN   2.84299         -1.31115                   7              high
4   28       2            NaN          NaN           NaN         NaN   3.29501         -5.17310                   4               low
5   29       2            NaN          NaN           NaN         NaN   0.17403         -6.88460                   6              high

5 rows × 10 columns

II. Explore data

We have data from about 23,600 participants aged 20 to 79. Let's check the age distribution.

In [10]:
plt.hist(bb_new['age']);

plt.xlabel('Age')
plt.ylabel('Frequency')

Next, let's confirm that the data replicate the expected age-related decline in each cognitive measure (memory and processing speed).

In [11]:
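# factorplot was removed from seaborn; catplot(kind="point") is its replacement
sns.catplot(x="ageBin", y="memory", data=bb_new, kind="point")
sns.catplot(x="ageBin", y="processingSpeed", data=bb_new, kind="point");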
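To put a number on the decline, one option (not part of the original analysis) is the least-squares slope of a cognitive score on age, dropping rows with missing values and scaling to change per decade. The data below are synthetic, with a built-in slope of -0.05 per year (-0.5 per decade); slope_per_decade is a hypothetical helper:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(1)
age = rng.integers(20, 80, size=1000)
toy = pd.DataFrame({"age": age,
                    "memory": 2.0 - 0.05 * age + rng.normal(scale=1.0, size=1000)})
toy.loc[toy.index % 10 == 0, "memory"] = np.nan   # some missing scores, as in the real data

def slope_per_decade(data, outcome, predictor="age"):
    """OLS slope of `outcome` on `predictor`, rescaled to change per 10 years."""
    d = data[[predictor, outcome]].dropna()
    slope, _intercept = np.polyfit(d[predictor], d[outcome], deg=1)
    return 10 * slope

print(f"memory change per decade: {slope_per_decade(toy, 'memory'):.2f}")
```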

Next, we regress each cognitive measure on each lifestyle score, with age regressed out of the lifestyle score (x_partial="age"). All three lifestyle scores show positive linear relationships with both cognitive measures, except exercise score with memory.

In [12]:
# Memory vs. each lifestyle score, with age regressed out of the lifestyle score
f1, (ax1, ax2, ax3) = plt.subplots(1, 3, sharey=True)
sns.regplot(x="leisureScore", y="memory", data=bb_new, x_partial="age", ax=ax1)
sns.regplot(x="exerciseScore", y="memory", data=bb_new, x_partial="age", ax=ax2).set_ylabel('')
sns.regplot(x="socioEconomicScore", y="memory", data=bb_new, x_partial="age", ax=ax3).set_ylabel('')
ax3.set(xlim=(0, 10), ylim=(-10, 5))
f1.tight_layout()

# Same for processing speed
f2, (ax1, ax2, ax3) = plt.subplots(1, 3, sharey=True)
sns.regplot(x="leisureScore", y="processingSpeed", data=bb_new, x_partial="age", ax=ax1)
sns.regplot(x="exerciseScore", y="processingSpeed", data=bb_new, x_partial="age", ax=ax2).set_ylabel('')
sns.regplot(x="socioEconomicScore", y="processingSpeed", data=bb_new, x_partial="age", ax=ax3).set_ylabel('')
ax3.set(xlim=(0, 10), ylim=(-10, 5))
f2.tight_layout()
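According to the seaborn documentation, x_partial regresses the named variable out of x before plotting and leaves y untouched. Assuming that step is an ordinary linear fit, the fitted slope is still age-adjusted: the residualized lifestyle score is uncorrelated with any linear function of age, so by the Frisch-Waugh-Lovell theorem its slope equals the lifestyle coefficient in a multiple regression that also includes age. A numerical check on synthetic data (all variables below are made up):

```python
import numpy as np

rng = np.random.default_rng(2)
n = 2000
age = rng.uniform(20, 80, n)
leisure = 5 - 0.03 * age + rng.normal(size=n)                  # synthetic score, correlated with age
memory = 1 - 0.04 * age + 0.3 * leisure + rng.normal(size=n)

def residualize(y, x):
    """Residuals of y after an OLS fit on x with an intercept."""
    X = np.column_stack([np.ones_like(x), x])
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return y - X @ beta

r_leisure = residualize(leisure, age)
r_memory = residualize(memory, age)

# (a) age removed from x only (what x_partial does)
x_only_slope = np.polyfit(r_leisure, memory, deg=1)[0]
# (b) age removed from both x and y
partial_slope = (r_leisure @ r_memory) / (r_leisure @ r_leisure)
# (c) coefficient on leisure in memory ~ 1 + leisure + age
full_beta, *_ = np.linalg.lstsq(np.column_stack([np.ones(n), leisure, age]), memory, rcond=None)

print(x_only_slope, partial_slope, full_beta[1])
```

Regressing age out of y as well (regplot's y_partial parameter) would leave the slope unchanged but remove age-related spread from the vertical axis.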

When we examine the effect of each lifestyle activity on cognitive function within each age bin, the patterns are broadly similar across age ranges.

In [13]:
sns.catplot(x="ageBin", y="memory", hue="leisureBin", data=bb_new, kind="point", palette="Set1");
sns.catplot(x="ageBin", y="memory", hue="exerciseBin", data=bb_new, kind="point", palette="Set1");
sns.catplot(x="ageBin", y="memory", hue="socioEconomicBin", data=bb_new, kind="point", palette="Set1");
In [14]:
sns.catplot(x="ageBin", y="processingSpeed", hue="leisureBin", data=bb_new, kind="point", palette="Set1");
sns.catplot(x="ageBin", y="processingSpeed", hue="exerciseBin", data=bb_new, kind="point", palette="Set1");
sns.catplot(x="ageBin", y="processingSpeed", hue="socioEconomicBin", data=bb_new, kind="point", palette="Set1");
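The point plots display group means; the same numbers can be tabulated with pivot_table, adding a high-minus-low column per age bin to check whether the lifestyle gap is similar across ages. A sketch on a toy frame with made-up values:

```python
import pandas as pd

toy = pd.DataFrame({
    "ageBin":     [2, 2, 2, 2, 6, 6, 6, 6],
    "leisureBin": ["low", "high", "low", "high", "low", "high", "low", "high"],
    "memory":     [0.0, 1.0, 0.2, 1.2, -1.0, -0.4, -1.2, -0.2],
})

# Mean memory per age bin (rows) and leisure group (columns)
table = toy.pivot_table(index="ageBin", columns="leisureBin", values="memory", aggfunc="mean")
table["high - low"] = table["high"] - table["low"]
print(table)
```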

There is an additive benefit on cognitive function from diverse lifestyle activities. Activity level is the number of lifestyle scores in the 'high' group (0 to 3): each of the three median splits contributes 1 if high and 0 if low or missing.

In [15]:
# Count the 'high' groups per participant (NaN != 'high', so missing counts as 0)
binCols = ['exerciseBin', 'leisureBin', 'socioEconomicBin']
bb_new['activityLevel'] = (bb_new[binCols] == 'high').sum(axis=1)
sns.catplot(x="activityLevel", y="memory", data=bb_new, kind="point");
sns.catplot(x="activityLevel", y="memory", hue="ageBin", data=bb_new, kind="point");
sns.catplot(x="activityLevel", y="processingSpeed", data=bb_new, kind="point");
sns.catplot(x="activityLevel", y="processingSpeed", hue="ageBin", data=bb_new, kind="point");
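To summarize the additive pattern numerically, one could tabulate the mean score at each activity level and fit a line across levels. The sketch below uses synthetic data with a built-in gain of 0.4 per level (the effect size is invented); a positive slope with roughly equal steps between consecutive means is what an additive benefit would look like:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(3)
bins = pd.DataFrame(rng.choice(["high", "low"], size=(3000, 3)),
                    columns=["exerciseBin", "leisureBin", "socioEconomicBin"])

# Activity level: number of 'high' groups (0-3); missing values (none here) would count as 0
level = (bins == "high").sum(axis=1)
memory = 0.4 * level + rng.normal(size=len(level))
toy = pd.DataFrame({"activityLevel": level, "memory": memory})

means = toy.groupby("activityLevel")["memory"].mean()
slope = np.polyfit(toy["activityLevel"], toy["memory"], deg=1)[0]
print(means.round(2))
print(f"gain per additional high-activity domain: {slope:.2f}")
```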

There also appear to be interactions between lifestyle activities.

In [16]:
sns.catplot(x="socioEconomicBin", y="memory", hue="leisureBin", data=bb_new, kind="bar", palette="Set1");
sns.catplot(x="socioEconomicBin", y="processingSpeed", hue="exerciseBin", data=bb_new, kind="bar", palette="Set1");

The benefit of leisure activity (on memory) and exercise (on processing speed) appears larger in the low socioeconomic status group.
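One way to quantify such an interaction (a sketch, not the poster's analysis) is a difference of differences of the four cell means: the high-minus-low leisure gap within the low socioeconomic group minus the same gap within the high socioeconomic group. A positive value means leisure is associated with a larger memory advantage at low socioeconomic status. The toy values below are made up:

```python
import pandas as pd

toy = pd.DataFrame({
    "socioEconomicBin": ["low"] * 4 + ["high"] * 4,
    "leisureBin":       ["low", "low", "high", "high"] * 2,
    "memory":           [-1.0, -0.8, 0.0, 0.2, 0.4, 0.6, 0.6, 0.8],
})

# Mean memory in each (SES group, leisure group) cell
cell = toy.groupby(["socioEconomicBin", "leisureBin"])["memory"].mean()

# Leisure advantage (high - low) within each SES group, then their difference
gap_low_ses = cell["low", "high"] - cell["low", "low"]
gap_high_ses = cell["high", "high"] - cell["high", "low"]
interaction = gap_low_ses - gap_high_ses
print(gap_low_ses, gap_high_ses, interaction)
```

Testing whether such a contrast differs from zero would need a model with an interaction term and standard errors, which this sketch does not provide.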

III. Conclusion

  1. More active people tend to show better memory and processing speed than less active people across all age ranges.
  2. There is an additive benefit from diverse lifestyle activities.
  3. There may be interactions between socioeconomic status and other lifestyle activities in their association with cognitive function.

IV. Cautionary Remarks

  1. The sample is skewed toward younger adults, so estimates for older adults have higher variance.
  2. This is an observational study: it shows correlation, not causation.

For questions or comments, email psykyu@gmail.com.